ABSTRACT.  An  explicit  understanding  of  the  opportunity  for 
constructing  new  algorithms  out  of  existing  (or  supposedly  existing) 
algorithms  is  presented. 

Say that B, A_1, A_2, ... are problems. We present an abstract
setting that provides for the effective use of algorithms for problems
A_1, A_2, ... in the design of efficient algorithms for problem B. A
notion of "lucid boxes", which is an extension of "black boxes", is
introduced for this purpose.




1.  Introduction 

Every  existing  algorithm  represents  some  knowledge  that  has  been 
acquired.  It  is  the  responsibility  of  theoreticians  in  the  field  to 
explore  every  opportunity  to  utilize  the  knowledge  accumulated  so  far 
for the future design of algorithms. Often it is easier to use
existing efficient algorithms for another problem in related models of
computation than to design a completely new algorithm.

For later in the introduction we need the following. It is well
known that the sequential execution of a computer program, say P, on
some input I can be described similarly to a proof in mathematical
logic. Associate a line with each step of this execution. The line
that corresponds to step s (≥ 1) includes the sequence V of all input
and program (including output) variables and their contents at the
beginning of the step. Next to it, specify the (atomic) instruction
(with respect to a machine or a programming language) which is executed
in step s. The sequence V at the beginning of step s+1 is written in
the next line, which is associated with this step.
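
The line-by-line description can be made concrete with a small sketch.
The three-instruction machine, the instruction names and the program
below are hypothetical and serve only as an illustration; they are not
the RAM of [AHU]. Each recorded line pairs the sequence V at the
beginning of a step with the instruction executed in that step.

```python
def trace_execution(program, variables):
    """Run `program` and return the line-by-line description of the run.

    Each line is a pair (V, instruction): a snapshot of all variables at
    the beginning of a step, and the atomic instruction executed in it.
    """
    variables = dict(variables)
    lines = []
    pc = 0  # location counter
    while pc < len(program):
        instruction = program[pc]
        lines.append((dict(variables), instruction))
        op = instruction[0]
        if op == "add":  # ("add", dst, src): dst <- dst + src
            _, dst, src = instruction
            variables[dst] += variables[src]
            pc += 1
        elif op == "dec":  # ("dec", var): var <- var - 1
            variables[instruction[1]] -= 1
            pc += 1
        elif op == "jump_if_positive":  # ("jump_if_positive", var, target)
            _, var, target = instruction
            pc = target if variables[var] > 0 else pc + 1
        else:
            raise ValueError(f"unknown instruction {instruction!r}")
    return lines, variables


# Hypothetical program: total <- n + (n-1) + ... + 1
SUM_PROGRAM = [
    ("add", "total", "n"),
    ("dec", "n"),
    ("jump_if_positive", "n", 0),
]

lines, final = trace_execution(SUM_PROGRAM, {"n": 2, "total": 0})
for step, (snapshot, instruction) in enumerate(lines, start=1):
    print(step, snapshot, instruction)
```

A black box would expose only the final value of `total`; the list
`lines` is the additional information that a lucid box makes available.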

A  specific  goal  of  the  present  paper  is  to  give  a  framework  for 
the  design  of  efficient  algorithms  which  is  simple,  uniform  and 
general.  The  contribution  of  the  paper  is  in  outlining  a  methodology 
which  encompasses  two  opportunities:  (1)  a  direct  design  of  a  procedure 
from  atomic  instructions,  and  (2)  a  build-up  of  a  composed  procedure 
from a sequence of given procedures. For direct design we suggest
using known ways; for the composition of new procedures from other
(specified or not fully specified) procedures we introduce some new
ideas. It will be evident that other known ways for composition
of procedures can be derived efficiently from our framework. Our
framework takes full advantage of the line-by-line execution
description of existing programs given above. Therefore, we coin the
name lucid boxes for the way in which existing programs are used. This
is  in  sharp  contrast  to  the  concept  of  'black  boxes',  where  only 
predeclared  outputs  of  existing  programs  are  transparent  to  procedures 
which  are  composed  out  of  these  existing  procedures. 

Examples for which the full computational power of our framework
is necessary are presented. Actually, some of our more powerful
examples arise when we need to compose a new procedure out of
hypothetically existing procedures. We demonstrate the relevance of
our  framework  for  the  design  of  efficient  sequential,  parallel  and 
distributed  algorithms  or  for  proving  theorems  about  their  existence. 
Such  typical  theorems  assert  the  performance  evaluation  of  new 
algorithms  as  functions  of  the  performance  parameters  of  other 
algorithms. 

We show that this framework also encompasses known techniques for
reducibilities among problems. The terms 'reducibility' and
'composition'  are  used  to  refer  to  the  same  operation.  The  use  of 
either  term  relates  to  the  significance  of  the  result  rather  than  to 
the  result  itself. 

It is interesting to note that we actually suggest an answer to a
problem raised implicitly by Aho, Hopcroft, and Ullman [AHU]. They
give a non-standard definition of NP-completeness ("A language L_0 is
NP-complete if the following condition is satisfied: If we are given a
deterministic algorithm of time complexity T(n) ≥ n to recognize L_0,
then for every language L in NP we can effectively (underlined by the
author) find a deterministic algorithm of time complexity T(p_L(n)),
where p_L is a polynomial that depends on L"). They do not specify in
the text what is meant by an 'effective' use of an algorithm for the
purpose of obtaining another algorithm. We propose an explicit
understanding of the term 'effectively' in this definition of
NP-completeness. (Not for polynomial time reducibilities only.)

We  would  like  to  say  that  this  paper  does  not  belong  to  the 
conventional  field  of  Programming  Languages.  Parts  of  our 
presentation,  however,  relate  to  this  field  similarly  to  the  way  in 
which  the  high-level  language  Pidgin  ALGOL  ([AHU])  does. 

The  simple  example  that  follows  points  at  unsatisfactory  features 
of  the  commonly  used  "black-box-techniques"  for  compositions  of 
procedures.  It  also  enables  the  reader  to  embody  some  of  the  more 
formal  definitions  of  the  next  section. 

An  introductory  example 

An example similar to the one presented below was given in
[Meg2] for other purposes. More on the family of examples which is
represented by this example can be found in Appendix III. I find this
example both simple and intriguing.

Let f_i(X) = a_i + X b_i, i = 1,...,n, be pairwise distinct
increasing functions of X (b_i > 0). For every X, let F(X) denote the
median of the set {f_1(X),...,f_n(X)}. Obviously, F(X) is a piecewise
linear monotone increasing function with O(n^2) break points. Given X,
F(X) can be evaluated in O(n) time [AHU] (once the f_i(X)'s have been
computed). The parametrized median problem: Solve the equation F(X) =
0. One possible way of solving this problem is to first identify the
set of intersection points

    I = {X_ij | a_i + X_ij b_i = a_j + X_ij b_j, i ≠ j}.

Every breakpoint of F is equal to one of the X_ij's. We can thus search
the set I ∪ {-∞, +∞} for two values X^(1), X^(2) such that
F(X^(1)) ≤ 0 ≤ F(X^(2)) and such that there is no X_ij in the open
interval (X^(1),X^(2)). Once X^(1) and X^(2) have been found, the
equation is easily solved, since F is linear over [X^(1),X^(2)]. The
search for such an interval requires O(log n) F-evaluations and (if we
do not wish to presort the set of X_ij's) finding medians of subsets of
the set of X_ij's whose cardinalities are n^2, n^2/2, n^2/4, ... . Such
an algorithm thus runs in O(n^2) time, which is dominated by the
evaluation of all the intersection points X_ij.

An alternative approach is as follows. Let us start the evaluation
of F(X) with X not specified. Assume that each of the branching
decisions of this n-number median computation is based on a comparison
between two input numbers only. Denote the solution of the equation
F(X) = 0 (which is of course not known at this point) by X*. The
outcome of the first comparison, performed by the algorithm for
evaluating F, depends of course on the numerical value of X.
Specifically, if our median-finding algorithm for evaluating F starts
with comparing f_1(X) and f_2(X) then the intersection point of these
two lines (X_12) is a critical value for this comparison; namely, for
X ≥ X_12, f_1(X) ≥ f_2(X) while for X ≤ X_12, f_1(X) ≤ f_2(X), or vice
versa. Thus, we may find X_12, evaluate F(X_12), and then decide
whether X* ≥ X_12 or X* ≤ X_12 according to the sign of F(X_12). We can
then proceed with the evaluation of F(X), where X is still not
specified but now restricted either to (-∞,X_12] or to [X_12,∞). The
same idea is repeatedly applied at the following points of comparison.
In general, when we need to compare f_i with f_j and X is currently
restricted to an interval [X',X"], then if X_ij does not lie in the
interval then the outcome of the comparison is uniform over the
interval; otherwise, by evaluating F(X_ij), we can restrict our
interval either to [X',X_ij] or to [X_ij,X"].

By the correctness of the median algorithm it can readily be seen
that we are finally restricted to an interval in which F(X) is
linear.

Since O(n) such comparisons are performed in the known
median-finding algorithm and each may require an evaluation of F (which
amounts to one median-finding), it follows that such an algorithm runs
in O(n^2) time. Note that [Meg2] mentions a linear time algorithm for
this problem that uses a straightforward technique.
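
The alternative approach can be sketched in Python, with a generator
playing the role of the suspended median algorithm. This is only an
illustration under stated assumptions: the function names are ours, the
coefficients are assumed to be integers (so that exact rational
arithmetic can be used), and insertion sort stands in for the
linear-comparison median algorithm of [AHU], so the sketch does not
attain the O(n^2) bound.

```python
from fractions import Fraction


def evaluate_F(lines, x):
    """F(x): the (lower) median of the values a + x*b."""
    values = sorted(a + x * b for a, b in lines)
    return values[(len(values) - 1) // 2]


def median_by_comparisons(n):
    """Comparison-based median finding, written as a coroutine.

    It never sees the numbers. It yields a pair of indices (i, j) and
    is resumed with True if item i is smaller than item j.
    Assumption: insertion sort is used for brevity, so O(n^2)
    comparisons are made rather than O(n).
    """
    order = []  # indices sorted by increasing value
    for i in range(n):
        pos = len(order)
        while pos > 0:
            i_is_smaller = yield (i, order[pos - 1])
            if not i_is_smaller:
                break
            pos -= 1
        order.insert(pos, i)
    return order[(n - 1) // 2]


def solve_parametric_median(lines):
    """Return X* with F(X*) = 0 for lines (a_i, b_i), b_i > 0, n >= 1."""
    lo = hi = None  # X* is known to lie in [lo, hi]; None = unbounded
    algorithm = median_by_comparisons(len(lines))
    try:
        request = next(algorithm)
        while True:
            i, j = request  # the median algorithm is suspended here
            (a_i, b_i), (a_j, b_j) = lines[i], lines[j]
            if b_i != b_j:
                x_ij = Fraction(a_j - a_i, b_i - b_j)  # critical value
                if (lo is None or lo < x_ij) and (hi is None or x_ij < hi):
                    # One F-evaluation tells on which side of x_ij X* lies.
                    if evaluate_F(lines, x_ij) < 0:
                        lo = x_ij
                    else:
                        hi = x_ij
            # The outcome is now uniform over the open interval (lo, hi),
            # so any interior point can be used to decide it.
            if lo is None and hi is None:
                probe = Fraction(0)
            elif lo is None:
                probe = hi - 1
            elif hi is None:
                probe = lo + 1
            else:
                probe = (lo + hi) / 2
            request = algorithm.send(a_i + probe * b_i < a_j + probe * b_j)
    except StopIteration as finished:
        a, b = lines[finished.value]  # F coincides with this line on [lo, hi]
    return Fraction(-a, b)


print(solve_parametric_median([(-3, 1), (0, 2), (4, 1)]))
```

Note that the driver never inspects the output of the median algorithm
until the very end; all the work is done at the points of suspension.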

The  connection  to  the  earlier  discussion  is  as  follows.  We  want 
to  solve  the  parametrized  median  problem.  For  this  we  use  a  median 
algorithm  in  a  comparison  model  of  computation,  as  was  specified  above. 
It  should  be  evident  that  we  actually  make  a  very  strong  use  of 
information  which  is  available  from  the  aforementioned  line-by-line 
description  of  an  execution.  Namely,  before  each  comparison  we  stop 
the  actual  execution,  make  some  computations  (that  use  the  parameters 
of  the  comparison)  on  the  side;  and  then  proceed  (from  the  same  point) 
with  the  execution  of  the  median  algorithm  by  "feeding"  the  variables 
that  have  to  be  compared  so  as  to  affect  properly  the  result  of  the 
comparison.  So,  we  can  say  that  steps  of  the  execution  of  the  median 
algorithm  were  transparent  to  us  and  we  could  intervene  in  this 
execution  by  suspending  it  and  changing  contents  of  variables.  The 
suspension  part  is  similar  to  the  notion  of  coroutines  in  the  sense 
that  whenever  a  coroutine  is  activated,  it  resumes  execution  of  its 
program  at  the  point  where  the  action  was  last  suspended  (see  [K]).  We 
presented  this  example  for  a  simple  demonstration  of  the  following: 

•  there  is  information  which  is  inherent  in  the  execution  process  of 
any  "reasonable"  existing  algorithm; 

•  this  information  is  not  contained  in  usual  output  specifications  of 
algorithms; 

•  it  is  advantageous  to  use  this  information  as  soon  as  it  is 
available  rather  than  run  the  existing  algorithm  to  the  end. 


Remark.  Knuth  [K]  presents  examples  where  coroutines  are  used  for 
elegant  and  concise  (simulation  of)  system  modeling.  However, 
regarding  the  use  of  coroutines  for  algorithms  Knuth  asserts  that  "It 
is  rather  difficult  to  find  short,  simple  examples  of  coroutines  which 
illustrate  the  importance  of  the  idea".  It  seems  that  such  an  example 
has  been  found  here.  Of  course,  any  median-finding  algorithm  can  be 
modified  to  solve  the  parametrized  median  problem.  But  if  we  insist  on 
'locking  its  mechanism'  and  operating  only  on  its  output  then  the 
coroutine  notion  seems  to  imply  better  running  time  than  the  subroutine 
notion.  The  reason  for  this  is  as  follows.  Suppose  that  we  define  all 
intermediate  results  and  the  complete  execution  to  belong  to  the 
output. We still need to perform the median algorithm all the way from
the beginning in order to resume operation from the point at which it
was actually aborted in the previous call.
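
The difference in running time can be illustrated by counting steps.
In the sketch below (our own illustration; the stand-in `algorithm` and
the two cost functions are hypothetical names) an algorithm of T steps
is interrupted after every step. Resuming it as a coroutine costs T
steps in total, while restarting it as a subroutine and replaying it up
to the previous stopping point costs 1 + 2 + ... + T steps.

```python
def algorithm(total_steps, counter):
    """A stand-in for any existing algorithm, suspendable after each step."""
    for step in range(total_steps):
        counter[0] += 1  # one unit of work
        yield step


def cost_with_coroutine(total_steps):
    """Suspend after every step and resume from the same point."""
    counter = [0]
    for _ in algorithm(total_steps, counter):
        pass  # the side computations of the caller would go here
    return counter[0]


def cost_with_subroutine(total_steps):
    """After every interruption, start again from the beginning."""
    counter = [0]
    for stop_after in range(1, total_steps + 1):
        rerun = algorithm(total_steps, counter)
        for _ in range(stop_after):
            next(rerun)  # replay everything up to the new stopping point
    return counter[0]


print(cost_with_coroutine(10), cost_with_subroutine(10))
```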

In  the  next  section  we  demonstrate  a  precise  characterization  of 
the  lucid-box  composition  technique  in  a  random-access-machine  (RAM) 
environment.  Section  3  relates  the  technique  to  known  concepts  of 
reducibilities  and  discusses  implications  it  might  have.  Section  4 
examines  a  possible  structured-programming  description  of  applications 
of  the  lucid-box  technique.  Appendices  I, II,  and  III  exemplify  the  use 
of  this  technique  for  efficient  distributed,  parallel  and  serial 
computation. 

2.  Lucid-box  Compositions  of  Procedures. 

We try to give a self-contained presentation. However, in a few
cases the reader will be referred to [AHU]. (It is referenced as 'the
book' in some places.) As a means of presentation we specify places in
the  book  where  'patches'  are  proposed.  The  reader  is  assumed  to  be 
familiar  with  the  contents  of  sections  1.2,  1.3,  1.4  and  1.8  where  the 
random  access  machine  (RAM),  the  random-access-stored-program-machine 
(RASP)  and  Pidgin  ALGOL  are  introduced.  Unlike  conventional 
programming  languages  Pidgin  ALGOL  programs  should  be  read  by  a  human 
reader  rather  than  a  machine.  Pidgin  ALGOL  permits  a  succinct 
presentation  of  algorithms  in  the  book.  A  Pidgin  ALGOL  program  can  be 
translated into a RAM or RASP program in a straightforward manner. It
is necessary to consider the time and space required to execute the
code corresponding to a Pidgin ALGOL statement on a RAM or RASP. Later we
say  how  to  extend  Pidgin  ALGOL  in  order  to  include  lucid-box 
composition  of  procedures.  We  start  by  presenting  a  machine  which  is 
slightly  different  from  both  the  RAM  and  RASP.  The  changes  relative  to 
the  RAM  in  the  definition  of  this  machine  reflect  proposed  changes  in 
the  definition  of  the  extended  Pidgin  ALGOL  with  respect  to  Pidgin 
ALGOL;  thereby,  implying  how  to  translate  statements  of  the  extended 
Pidgin ALGOL into this machine. We call this new machine the
random-access-memory-and-program machine (RAMP). The only difference
between the RAMP and the known RAM is that the program is located in a
read-only memory. Recall that the indirect read instruction 'READ *i'
means: "copy the register whose number is the content of register i
into the accumulator". The 'READ *i' instruction of our machine may
also refer to locations (registers) in the program part of the machine.
(The instructions are encoded by integers. For an example of a similar
encoding see the presentation of the RASP in the book.) The book shows
that  the  order  of  time  (and  space)  complexities  of  the  RAM  and  RASP  are 
the  same  for  the  same  algorithm  if  the  cost  of  instructions  is  either 
uniform  or  logarithmic.  No  new  ideas  are  required  in  order  to 
establish  similar  relationships  between  the  RAM  (or  RASP)  and  the  RAMP. 

Let P_1, P_2, ... be existing programs which are written for a RAMP.
We  present  a  framework  for  specifying  a  new  program  P  by  using  these 
programs.  We  do  it  in  two  stages.  First,  we  specify  P  for  a  model  of 
computation  which  contains  the  RAMP.  Later,  we  show  a  possible  way  of 
translating  P  into  the  RAMP.   This  may  clarify  the  choice  of  the  RAMP. 

The  model  of  computation  for  which  P  is  defined  contains  the  RAMP 
in the following sense. It employs a sequence of RAMPs R_0, R_1, ... .
The main program for P is located in R_0. This program may be
constructed out of the usual (optionally) labeled RAMP instructions
with respect to R_0. In addition the RAMPs R_1, R_2, ... are attached to
R_0 in a "slave-master" relation. Any one of P_1, P_2, ... may be run on
each of these RAMPs in the usual way with one exception:

The RAMP R_i, i ≥ 1, has a distinguished additional square called
"the dominator of R_i" (d(R_i) for short) that enables R_0 to control
its operation. P_j is changed so that the following pair of
instructions must be accessed before every access to any of its
instructions.

1. Enter 'red' into d(R_i).

2. Proceed only if d(R_i) contains 'green'. (Only R_0 may enter 'green'
into d(R_i) and R_i has to wait until R_0 does so.)

The program for R_0 may also have instructions for:

1. Performing all the RAMP instructions with respect to any of the
input, memory or output squares of each R_i (with respect to the
accumulator of R_0).

2. Starting any P_j on any R_i.

3. Entering 'green' into d(R_i), i ≥ 1.

4. Reading the P_j instructions, and especially the next instruction to
be executed, into the accumulator of R_0.

We  have  already  made  our  point  in  the  previous  section  regarding 
how  to  relate  this  composition  scheme  to  our  introductory  example.  Our 
procedure starts the median algorithm on R_1. Whenever a comparison
between two (copies of) inputs of the median algorithm is going to take
place, our procedure does the following. It 'suspends' the median
algorithm, makes all the side computations required, determines the
result of the comparison by assigning fictitious values to the
corresponding locations of R_1 and resumes the operation of the median
algorithm.
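
The red/green protocol and the powers of the master can be sketched as
follows. This is a toy model, not the RAMP itself: the class
`SlaveRAMP`, the function `master`, the three instructions JLT, COPY
and HALT, and the sample program are all assumptions made for the
illustration. The slave enters 'red' before every instruction and
waits; the master reads the next instruction, may rewrite the slave's
memory, and then enters 'green'.

```python
class SlaveRAMP:
    """A toy slave machine R_i with a dominator square d(R_i)."""

    def __init__(self, program, memory):
        self.program = tuple(program)  # program part, read-only
        self.memory = dict(memory)
        self.pc = 0  # location counter
        self.dominator = "green"

    def run(self):
        while True:
            self.dominator = "red"  # 1. enter 'red' into d(R_i)
            yield  # 2. wait until the master enters 'green'
            assert self.dominator == "green"
            op, *args = self.program[self.pc]
            if op == "JLT":  # jump to target if memory[a] < memory[b]
                a, b, target = args
                smaller = self.memory[a] < self.memory[b]
                self.pc = target if smaller else self.pc + 1
            elif op == "COPY":  # memory[dst] <- memory[src]
                src, dst = args
                self.memory[dst] = self.memory[src]
                self.pc += 1
            elif op == "HALT":
                return


def master(slave, before_comparison=None):
    """The program of R_0: supervise one slave, step by step."""
    trace = []
    for _ in slave.run():  # the slave is suspended, d(R_i) is 'red'
        instruction = slave.program[slave.pc]  # read the next instruction
        trace.append((slave.pc, instruction))
        if instruction[0] == "JLT" and before_comparison is not None:
            before_comparison(slave, instruction)  # may rewrite slave memory
        slave.dominator = "green"  # let the slave proceed
    return trace


# Hypothetical slave program: out <- min(x, y)
MIN_PROGRAM = [
    ("JLT", "x", "y", 3),
    ("COPY", "y", "out"),
    ("HALT",),
    ("COPY", "x", "out"),
    ("HALT",),
]


def force_x_smaller(slave, instruction):
    """Assign fictitious values so that the comparison comes out as 'x < y'."""
    _, a, b, _ = instruction
    slave.memory[a], slave.memory[b] = 0, 1


plain = SlaveRAMP(MIN_PROGRAM, {"x": 5, "y": 3})
plain_trace = master(plain)
forced = SlaveRAMP(MIN_PROGRAM, {"x": 5, "y": 3})
forced_trace = master(forced, force_x_smaller)
print(plain.memory["out"], forced.memory["out"])
```

In the introductory example the role of `force_x_smaller` is played by
the side computation that evaluates F at the critical value.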

Let  us  overview  a  possible  way  for  mapping  our  composed  procedure 
into  a  single  RAMP  denoted  S. 

1. Proper versions of the main program for P and the programs
P_1, P_2, ... are located in the program part of S.

2. An easy-to-compute mapping from the input, memory, output and
location-counter locations of R_1, R_2, ... and the memory and
location-counter locations of R_0 into the memory of S has to be
defined. Such a mapping has to use as little memory space of S as
possible. We present one possible mapping. It seems to be especially
appropriate for the (difficult) case where the number of RAMPs which
are actually employed (denoted by n) and the maximum size of a RAMP's
memory over all their uses (denoted by m) are not known in advance.
Some easier cases may readily result in more efficient mappings. Let
the serial number N(i,j) of location i of RAMP R_j (i ≥ 1, j ≥ 0) be
the cardinality of the set

    {(l,k) | ((l + k < i + j) OR (l + k = i + j AND k < i))
             AND (l ≥ 1 AND k ≥ 0)}.

(Intuitively, the array of pairs (l,k), l ≥ 1 and k ≥ 0, is sorted in a
lexicographic order; first by the (finite length) diagonal that crosses
a pair and second by the serial number of this pair on this diagonal).
So, we map location i of RAMP R_j into memory location N(i,j)+c of RAMP
S, where c is some constant. This guarantees that S uses O((n+m)^2)
memory locations.

3. The transition of control, which is done by updating the d(R_i)'s,
can be readily simulated. For this, 'activate' the location of S which
corresponds to the location-counter of R_i.
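
The serial number N(i,j) of item 2 can be checked with a short sketch.
The first function counts the set exactly as it is defined; the second
uses a closed form, N(i,j) = (i+j)(i+j-1)/2 + i, which is our own
derivation from that definition and is not stated in the text. The
function names are hypothetical.

```python
def serial_number_by_definition(i, j):
    """N(i, j) as the cardinality of the defining set (l >= 1, k >= 0)."""
    s = i + j
    return sum(
        1
        for l in range(1, s + 1)
        for k in range(0, s)
        if l + k < s or (l + k == s and k < i)
    )


def serial_number(i, j):
    """Closed form: s(s-1)/2 pairs on earlier diagonals, plus i on diagonal s."""
    s = i + j
    return s * (s - 1) // 2 + i


m, n = 6, 5  # hypothetical memory size and number of RAMPs
table = {(i, j): serial_number(i, j)
         for i in range(1, m + 1) for j in range(n)}
print(max(table.values()), (m + n) ** 2)
```

Distinct pairs receive distinct serial numbers, and the largest number
used grows quadratically in n + m, in agreement with the bound above.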

It should be clear that if the number of steps in the 'source'
composed  procedure  is  T,  then  the  number  of  steps  in  the  translation 
into  the  single  RAMP  is  0(T). 

Let  us  go  back  to  the  extended  Pidgin  ALGOL.  Procedures  can  now 
be  invoked  by  either  calling  them  for  the  first  time  or  resuming  their 
operation.  Their  complete  execution  (in  the  way  it  is  described  in  the 
introduction)  can  be  used. 

The  appendices  supply  more  examples  where  this  power  of 
composition  of  procedures  is  useful.  The  reader  is  advised  to  read  the 
appendices  at  this  stage.  [V2]  contains  examples  for  applications  of 
the  lucid-box  composition  technique  in  synchronous  parallel  and 
distributed  computation.  One  of  these  examples  is  summarized  in  the 
first  appendix.  Appendix  II  reviews  another  application  for 
synchronous distributed computation taken from [V1]. A lucid-box
composition  that  uses  sorting  and  merging  networks  is  given.  Appendix 
III  includes  references  for  further  applications  in  parametrized 
computing.  An  application  in  asynchronous  distributed  computation  can 
be found in [V3]. In order to keep this presentation within reasonable
length  we  avoid  saying  more  about  it. 


3.  Relations  with  Other  Notions  of  Composition

We first refer to two reducibilities which are commonly used.
Definitions of widely used terms are sometimes omitted. They can be
found in [GJ]. The most popular technique in the literature for
showing  that  an  algorithm  for  one  decision  problem  can  be  used  for  a 
solution  of  another  decision  problem  uses  transformation 
reducibilities.  That  is,  a  constructive  transformation  that  maps  any 
instance  of  the  first  problem  into  an  equivalent  instance  of  the  second 
is  given.  Such  a  transformation  enables  us  to  convert  any  algorithm 
for  the  second  problem  into  a  corresponding  algorithm  for  the  first 
problem.  The  well  established  notion  of  composition  of  functions 
suggested  the  composition  of  a  function  on  an  existing  one,  thereby 
creating  a  new  function.  This  is  somewhat  similar  to  the  way 
transformation reducibilities are defined. However, the fact that an
execution of an existing algorithm may include much more information
than its output is actually ignored by the transformation reducibility.
On the other hand, the extensive applicability of this reducibility
suggests  that  in  spite  of  its  narrowness  it  often  focuses  on  the  right 
things. 

The known generalization of transformation reducibility (see [GJ])
strengthens  our  point  of  similarity  to  composition  of  functions  even 
more.  A  Turing  reduction  from  one  (search  or  decision)  problem  to 
another  is  an  algorithm  that  solves  the  first  problem  by  using  a 
hypothetical  subroutine  for  the  second  problem.  This  subroutine  can  be 
called more than once. Each time the subroutine is called it operates
(from the point of view of the algorithm) as the function it realizes;
i.e., the input for an application of this subroutine is written by the
algorithm  in  some  specified  memory  location  and  then  the  subroutine 
responds  by  writing  the  output  in  some  specified  memory  location.  A 
similar  notion  of  operation  is  sometimes  referred  to  as  'black  boxes'. 
Polynomial  time  transformation  reducibility  is  often  called 
"Karp-reducibility"  while  polynomial  time  Turing  reducibility  is  often 
called  "Cook-reducibility".  See  Section  5.2  in  [GJ]  for  a  history  of 
terminology. 
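
For readers who prefer a concrete instance, the two black-box
reducibilities can be sketched on textbook problems of our own
choosing (PARTITION and SUBSET SUM; neither appears in this paper).
The brute-force `subset_sum_decide` is only a stand-in for the
hypothetical existing algorithm. The transformation maps an instance
and calls the box once; the Turing reduction calls the box repeatedly
and sees nothing but its yes/no output.

```python
from itertools import combinations


def subset_sum_decide(numbers, target):
    """Stand-in black box: does some sub-collection sum to target?"""
    return any(
        sum(chosen) == target
        for size in range(len(numbers) + 1)
        for chosen in combinations(numbers, size)
    )


def partition_to_subset_sum(numbers):
    """Transformation: a PARTITION instance -> an equivalent SUBSET SUM one."""
    total = sum(numbers)
    if total % 2:
        return [], 1  # a fixed "no" instance
    return list(numbers), total // 2


def partition_decide(numbers):
    """Transformation reducibility: transform, then call the box once."""
    return subset_sum_decide(*partition_to_subset_sum(numbers))


def subset_sum_search(numbers, target, oracle=subset_sum_decide):
    """Turing reducibility: find a solution using only yes/no answers."""
    if not oracle(numbers, target):
        return None
    remaining, chosen = list(numbers), []
    while remaining:
        x = remaining.pop()
        if oracle(remaining, target):
            continue  # a solution exists without x, so drop it
        chosen.append(x)  # every remaining solution uses x
        target -= x
    return chosen


print(partition_decide([3, 1, 1, 2, 2, 1]))
print(subset_sum_search([3, 34, 4, 12, 5, 2], 9))
```

In both cases only the declared output of `subset_sum_decide` is used;
nothing about its execution is visible to the caller.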

We already implied that we can alternate freely between
"compositions" and "reducibilities". Thus, we can use lucid-box
compositions for reducibilities between procedures. Since we called
the composition "lucid-box composition" we call the reducibility
"lucid-box reducibility".

Transformation  and  Turing  reducibilities  are  instances  of 
lucid-box  reducibilities  since  they  permit  us  to  define  reducibilities 
among  sets  of  solutions  for  problems.  Unlike  these  reducibilities 
lucid-box  reducibilities  permit  us: 

(1)  To  restrict  sets  of  solutions. 

In  our  examples  we  demonstrated  how  useful  it  might  be  to  restrict 
ourselves  to  comparison  models  of  computation,  or  to  force  the  copying 
assumption  or  to  deal  with  comparison  networks  only. 

(2)  To  use  the  full  information  in  the  execution  of  procedures  and  not 
only  their  output. 

(3)  To  relate  closely  the  resource  complexity  of  the  new  procedure  to 
the  resource  complexity  of  the  procedures  which  are  used  by  the 
lucid-box  reducibility. 

Our  examples  demonstrate  the  applicability  of  lucid-box  compositions  or 
reducibilities  for  either  very  efficient  algorithms  or  whenever 
multi-parameter complexity optimization is required. As we already
implied in the appendices, multi-parameter optimization is typical of
parallel and distributed computation environments, where there is a
need to optimize simultaneously: time, sizes of local memories,
communication load on many lines, etc.

A  natural  question  to  be  asked  is  about  polynomial  time  lucid-box 
reducibilities;  and,  in  particular,  do  they  affect  the  theory  of 
NP-completeness?  Unlike  the  design  of  efficient  algorithms,  where  new 
opportunities  are  opened,  we  show  in  the  remainder  of  this  section  that 
the  answer  to  this  question  is  essentially  negative.  The  argument,  as 
we  shall  see,  is  simple.  It  is  based  on  the  following  observation. 
Given  input  variables  and  a  program  that  operates  on  them  it  is  an 
arbitrary  (semantical)  decision  which  program  variables  are  declared  to 
be  outputs. 

Extending  the  notion  of  lucid-box  reducibility  to  Turing  machines 
and  the  related  definitions  of  NP-completeness  is  straightforward  and 
therefore omitted. In the introduction we presented the definition of
[AHU] for NP-completeness. Obviously, every problem which is
NP-complete using Cook's reducibility is NP-complete using [AHU]'s
reducibility. [Ev] implies that it is an open question whether the
other direction holds as well. The following sentence relates to the
definition of NP-completeness of [AHU], given above. Note that both
the program for recognizing L_0 and its execution are transparent to us
in a lucid-box reducibility from the problem of recognizing L to the
problem of recognizing L_0. This seems to imply that our definition of
NP-completeness is equivalent to that of [AHU], since nothing can stop
us from using in any "effective" way the program for recognizing L_0.

We would like to elaborate a little on the terminology which is being
used. [GJ] defines a search problem Π to consist of a set D_Π of finite
objects called instances and, for each instance I ∈ D_Π, a set S_Π(I)
of finite objects called solutions for I. An algorithm is said to solve
a search problem Π if, given as input any instance I ∈ D_Π, it returns
the answer "no" whenever S_Π(I) is empty and otherwise returns some
solution s belonging to S_Π(I). Define, alternatively, a procedural
solution to Π to consist of an algorithm that solves it including all
its intermediate computations. Assume that instead of algorithms that
solve search or decision problems we were interested in procedural
solutions of these problems. By definition, any call to a procedural
solution for some problem results in a listing of all its intermediate
computations throughout the T time units during which the procedural
solution ran (for some T). Right after the example in the
introduction, we remarked that a coroutine that runs in time O(T) can
be simulated by a subroutine that runs in time O(T^2) by restarting it
instead of resuming its operation from the place it was stopped.

Therefore,  if  in  a  reduction  of  the  procedural  solutions  of  one 
problem  to  the  procedural  solutions  of  another  problem  we  used  only 
executions  of  solutions  to  the  second  problem  we  can  simulate  it  by  a 
polynomial time Turing reduction. However, the (bizarre) possibility
of  using  the  program  for  the  second  problem  in  other  ways  than 
executing  it  still  leaves  the  question  of  [Ev]  "formally"  open. 

Remark. It might be interesting to define an "algebra of procedures"
as follows. Its elements (the procedures) will be pairs which consist
of a program and input and program variables (but must not have output
variables). Lucid-box compositions will serve as the operation of this
algebra  since  they  allow  for  producing  a  procedure  out  of  existing 
ones.  This  will  be  analogous  to  an  algebra  of  algorithms  that  can  be 
defined  using  black-box  compositions.  We  do  not  elaborate  on  this  idea 
here  and  leave  it  for  future  research. 

4.  On  Structured  Programming. 

A  growing  number  of  programs,  documentation  of  programs  and 
presentations  of  algorithms  in  the  literature  use  principles  of 
structured programming for modularity and clarity of exposition.

A  methodologically  useful  way  to  present  algorithms  in  this  spirit 
is  to  start  with  a  high-level  description  of  the  algorithm  which 
includes the milestones in its performance. In a hierarchical fashion
this  overview  is  filled  with  details,  until  a  complete  low-level 
accurate  specification  is  obtained.  For  instance,  take  the 
biconnectivity  algorithm  using  depth-first-search  (DFS)  in  [AHU]. 
After  presenting  the  DFS  method  for  searching  a  graph  (to  which  we 
refer as a high-level description) the book presents the DFS again, now
intermixed  with  the  bookkeeping  required  for  the  biconnectivity 
algorithm.  An  alternative  approach  that  we  would  like  to  point  out 
looks  at  the  DFS  as  a  navigator  that  tells  us  where  to  go  next,  while 
there  is  a  bookkeeper  that  should  be  told  where  we  are,  in  order  to  be 
able  to  do  the  necessary  bookkeeping.  We  suggest  the  following 
description.  There  is  a  'general  manager'  that  coordinates  between  the 
navigator  ('president  of  the  company')  and  the  bookkeeper.  The  general 
manager  asks  the  president  where  to  go,  and  then  transmits  the  response 
to  the  bookkeeper.  The  bookkeeper  writes  down  whatever  is  required  and 
reports  to  the  general  manager  when  he  is  done.  Then  the  general 
manager  consults  the  president  again,  and  so  on. 
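
The description above can be sketched with a Python generator as the
navigator. The event names, the class and the function names are
assumptions made for the illustration, and the bookkeeping shown is the
standard low-point computation of articulation points for a simple
connected undirected graph, not a transcription of the program in
[AHU]. The navigator is a plain DFS that knows nothing about
biconnectivity; the bookkeeper never decides where to go.

```python
def dfs_navigator(graph, root):
    """Navigator: a plain DFS that only reports where the search goes."""
    visited = {root}
    yield ("visit", None, root)
    stack = [(root, None, iter(graph[root]))]
    while stack:
        u, parent, neighbours = stack[-1]
        advanced = False
        for v in neighbours:
            if v == parent:
                continue  # assumes a simple graph (no parallel edges)
            if v in visited:
                yield ("back_edge", u, v)
            else:
                visited.add(v)
                yield ("visit", u, v)
                stack.append((v, u, iter(graph[v])))
                advanced = True
                break
        if not advanced:
            stack.pop()
            if parent is not None:
                yield ("retreat", parent, u)


class BiconnectivityBookkeeper:
    """Bookkeeper: records DFS numbers and low points as it is told."""

    def __init__(self):
        self.number, self.low, self.children = {}, {}, {}
        self.articulation = set()
        self.root = None

    def record(self, event, u, v):
        if event == "visit":  # v is reached for the first time, from u
            self.number[v] = self.low[v] = len(self.number)
            self.children[v] = 0
            if u is None:
                self.root = v
            else:
                self.children[u] += 1
        elif event == "back_edge":  # edge from u to an already visited v
            self.low[u] = min(self.low[u], self.number[v])
        elif event == "retreat":  # the search returns from child v to u
            self.low[u] = min(self.low[u], self.low[v])
            if u != self.root and self.low[v] >= self.number[u]:
                self.articulation.add(u)

    def finish(self):
        if self.root is not None and self.children[self.root] >= 2:
            self.articulation.add(self.root)
        return self.articulation


def general_manager(graph, root):
    """Ask the navigator where to go, then tell the bookkeeper."""
    bookkeeper = BiconnectivityBookkeeper()
    for event in dfs_navigator(graph, root):
        bookkeeper.record(*event)
    return bookkeeper.finish()


GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [1, 0], 3: [1, 4], 4: [3]}
print(general_manager(GRAPH, 0))
```

Replacing the bookkeeper by another class reuses `dfs_navigator`
unchanged.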

The modularity of the presentation is increased since now the
high-level description is an independent piece of software. Besides
the important advantage of clarity we would like to point out the
following possible advantage: In many cases a first version of an
algorithm is improved later by polishing its low-level implementation
(bookkeeping) only. It is desirable in such cases to reuse the
previous fault-free high-level specification. If it is written as an
independent unit it can be used as is.

[K]  mentions  the  methodological  importance  of  understanding 
several  coroutines  as  (symmetric)  equal  partners,  unlike  a  master-slave 
relation  between  a  program  and  a  subroutine  it  calls.  The  above 
hierarchical  description  may  well  fit  this  understanding  by  naming  a 
new  general  manager  who  employs  all  coroutines  involved.  So  no 
coroutine  dominates  another. 

Appendices 

Appendix  I.  Choice  of  a  model  of  parallel  computation. 

The  paper  [V2]  deals  with  the  problem  of  choosing  a  theoretical 
abstract  model  of  parallel  computation  to  be  simulated  by  synchronous 
distributed machines. The principle applied is to choose the most
permissive model of parallel computation as long as the cost in
computational resources does not increase.

We  sketch  the  main  ideas  in  this  paper  emphasizing  the  ones  that 
relate  to  lucid-box  compositions. 

Our  machine  is  assumed  to  be  represented  by  a  model  of  synchronous 
distributed computations (SDC). It employs a sequence of RAMs (see
[AHU]) $P_1, P_2, \ldots, P_d$ that operate synchronously in parallel. Each
processor  can  communicate  directly  with  no  more  than  c  other  processors 
(where  c  is  some  "small"  constant)  through  communication  lines. 
Communication  registers  which  are  associated  with  the  lines  are  used 
for  the  communication. 

The  concurrent-read  exclusive-write  parallel  RAM  (CREW  PRAM)  is  a 
synchronous  model  of  parallel  computation  in  which  all  p  processors 
$P_1, \ldots, P_p$ have access to a shared memory. Simultaneous access to the
same  common  memory  location  is  allowed  for  read  (but  not  write) 
purposes.  The  Fetch-and-Add  (F&A)  PRAM  model  of  synchronous  parallel 
computation  allows  every  operation  which  is  permitted  by  the  CREW  PRAM. 
In addition, the following is allowed. Say that k processors $i_1, \ldots, i_k$
simultaneously execute the instructions F&A($A, e_1$), ..., F&A($A, e_k$),
respectively, where A is a common memory address and $e_j$ is some address
in the local memory of processor $i_j$, for $1 \le j \le k$. The result will
be as follows:

1. $A \leftarrow A + e_1 + e_2 + \cdots + e_k$.

2. For some permutation $\sigma$ of $\{1, 2, \ldots, k\}$ (which is not known in
advance) the k sums

$A,\ A + e_{\sigma(1)},\ A + e_{\sigma(1)} + e_{\sigma(2)},\ \ldots,\ A + e_{\sigma(1)} + e_{\sigma(2)} + \cdots + e_{\sigma(k-1)}$

are stored in local registers of processors $i_{\sigma(1)}, i_{\sigma(2)}, \ldots, i_{\sigma(k)}$,
respectively.

Generally speaking, the main result of the paper is that
every  simulation  of  algorithms  given  in  the  CREW  PRAM  into  the  SDC  has 
a  counterpart  simulation  for  algorithms  given  in  the  F&A  PRAM  into  the 
SDC  which  requires  the  same  SDC-time  and  SDC-local  memories  up  to  a 
constant  factor.  This  supports  the  choice  of  the  F&A  PRAM  as  an 
abstract  model  of  parallel  computation  as  was  done  for  the 
NYU-Ultracomputer  [GGKMRS]. 
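
The effect of one F&A pulse on a single common address, as defined in items 1 and 2 above, can be illustrated by a small sequential sketch. The function name `fetch_and_add_pulse` and the representation of memories as dictionaries are assumptions made for illustration only; the increments are passed as values rather than as local addresses, and the unknown permutation is modelled by a random shuffle.

```python
import random
from typing import Dict, Optional


def fetch_and_add_pulse(shared: Dict[int, int],
                        address: int,
                        increments: Dict[int, int],
                        rng: Optional[random.Random] = None) -> Dict[int, int]:
    """Serve F&A(address, e_j) for every processor j in one pulse.

    `increments` maps a processor id to its increment e_j.  The result maps
    each processor id to the value it fetches.
    """
    rng = rng or random.Random()
    order = list(increments)
    rng.shuffle(order)              # the permutation sigma is not known in advance
    fetched: Dict[int, int] = {}
    running = shared[address]
    for proc in order:
        fetched[proc] = running     # A + e_sigma(1) + ... + e_sigma(j-1)
        running += increments[proc]
    shared[address] = running       # A <- A + e_1 + ... + e_k
    return fetched
```

Whatever order is drawn, the final contents of the address is the same; only the distribution of the intermediate sums among the processors depends on the permutation.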

We  sketch  the  main  ideas  that  relate  to  lucid-box  compositions. 
Let us take a pulse of the F&A PRAM. Assume that in this pulse
processors $i_1, i_2, \ldots, i_k$ want to execute F&A($A_1, e_1$), F&A($A_2, e_2$), ...,
F&A($A_k, e_k$), respectively, where $A_j$ is a common memory address and $e_j$ is
some address in the local memory of processor $i_j$, for $1 \le j \le k$. All
other processors of the F&A PRAM are assumed to remain idle during the
present pulse. We have now arrived at the point where the present
example relates to lucid-box compositions. We 'replace' the given
pulse by the following: every F&A($A_j, e_j$) instruction of processor $i_j$ is
replaced by the instruction "read common address $A_j$ into local address
$e_j$ of processor $i_j$", for $1 \le j \le k$. We apply the simulation of the CREW
PRAM into the SDC in order to simulate this 'reading' pulse. This
simulation is done in auxiliary memory locations of the SDC, and all
cases where the contents of a (memory or communication) location of the
SDC is copied into another are recorded in the local memory of the
processor that does it. The correctness of the simulation implies that
the contents of $A_j$ (or, rather, of the SDC memory location that
simulates memory location $A_j$) is propagated from $A_j$ to $e_j$ between the
time the simulation of the present reading pulse begins until the time
it ends. Say that it takes T cycles of the SDC machine. We assume
that this propagation occurs by repeatedly copying the contents of $A_j$
from one SDC memory location to another until it is finally copied into
$e_j$. (No splitting of the bits of this contents or encoding of the contents
of several locations into one is allowed.) This is called the copying
assumption.

Let A be the set of all memory locations (in all local memories)
of the SDC. The following (layered) directed graph G(V,E) is
introduced as a tool for specifying the synchronization of the
remaining steps; edges between two successive layers represent
simultaneous events that take place between two successive ticks of our
'clock'. The set of vertices V of G is $V = \{(a,t) \mid a \in A,\ 0 \le t \le T\}$.
Layer t, $L_t$, of G is $L_t = \{(a,t) \mid a \in A\}$ for $0 \le t \le T$. Let

$E_1 = \{[(a,t-1),(b,t)] \mid \text{an (SDC) processor } P_i, \text{ for some } 1 \le i \le s, \text{ copied the contents of address } a \text{ into address } b \text{ at time unit } t,\ 1 \le t \le T\}$

$E_2 = \{[(a,t-1),(a,t)] \mid \text{no processor wrote into address } a \text{ at time unit } t,\ 1 \le t \le T\}$

The set of edges E is the union of $E_1$ and $E_2$. Note that all edges of
$E_1$ were actually recorded in the local memories of the corresponding
processors.
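
The construction of G from the recorded copy operations can be sketched as follows. The function names and the encoding of records as tuples `(a, b, t)` are hypothetical; the sketch assumes that every write that is not a copy is supplied separately in `other_writes`.

```python
from typing import Dict, Iterable, List, Set, Tuple

Vertex = Tuple[str, int]          # (memory location a, time t)
Edge = Tuple[Vertex, Vertex]
Copy = Tuple[str, str, int]       # contents of a copied into b at time unit t


def build_layered_graph(locations: Iterable[str],
                        copies: Iterable[Copy],
                        other_writes: Iterable[Tuple[str, int]],
                        T: int) -> Tuple[Set[Vertex], Set[Edge]]:
    """Return (V, E) with E = E1 (copy edges) united with E2 (unchanged locations)."""
    locations = list(locations)
    copies = list(copies)
    written = {(b, t) for (_, b, t) in copies} | set(other_writes)
    vertices = {(a, t) for a in locations for t in range(T + 1)}
    e1 = {((a, t - 1), (b, t)) for (a, b, t) in copies}
    e2 = {((a, t - 1), (a, t))
          for a in locations for t in range(1, T + 1)
          if (a, t) not in written}
    return vertices, e1 | e2


def reachable(edges: Set[Edge], start: Vertex) -> Set[Vertex]:
    """All vertices that the contents held at `start` can have propagated to."""
    succ: Dict[Vertex, List[Vertex]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    seen, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen
```

Under the copying assumption, a local address that received the contents of a common address by time T shows up as a layer-T vertex reachable from the layer-0 vertex of that common address.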

It is not difficult to observe that binary trees with $A_j$ as roots
and $e_j$ as leaves are subgraphs of G. The simulation of the F&A
instructions proceeds by identifying these trees and implementing a
synchronous partial-sums computation on them in order to satisfy the
F&A instructions. The following facts follow readily: the time
required for the simulation is O(T); and, if processors $P_1, P_2, \ldots, P_d$
employed local memories of sizes $m_1, m_2, \ldots, m_d$, respectively, for the
simulation of the reading pulse then they employ local memories of size
$O(m_1+T), O(m_2+T), \ldots, O(m_d+T)$, respectively, for the simulation of the
pulse that involved F&A instructions.
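
A partial-sums computation on one such tree can be sketched as an upward sweep that accumulates the increments of every subtree, followed by a downward sweep that hands each leaf the sum of everything to its left. The sketch below is sequential and recursive, whereas the simulation in the paper is synchronous; the function name and the dictionary encoding of the tree are assumptions.

```python
from typing import Dict, List, Tuple


def tree_fetch_and_add(children: Dict[str, List[str]],
                       root: str,
                       increment: Dict[str, int],
                       initial: int) -> Tuple[int, Dict[str, int]]:
    """Return (new contents of the root address, value fetched by each leaf).

    `children` maps an internal node to its ordered children; a node that is
    absent from `children` is a leaf and carries the increment e_j.
    """
    subtotal: Dict[str, int] = {}

    def up(node: str) -> int:
        kids = children.get(node, [])
        if kids:
            subtotal[node] = sum(up(k) for k in kids)
        else:
            subtotal[node] = increment.get(node, 0)
        return subtotal[node]

    fetched: Dict[str, int] = {}

    def down(node: str, offset: int) -> None:
        kids = children.get(node, [])
        if not kids:
            fetched[node] = offset          # A plus all increments to the left
            return
        for k in kids:
            down(k, offset)
            offset += subtotal[k]           # later siblings see earlier subtrees

    total = up(root)
    down(root, initial)
    return initial + total, fetched
```

The left-to-right order of the leaves plays the role of the permutation in the definition of the F&A instruction.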

The refined correspondence between many performance parameters of
an existing procedure and corresponding parameters of a new procedure
seems to be relevant for many computational environments. For instance,
an algorithm for a synchronous parallel shared memory model of
computation may be evaluated by its number of steps, the size of each of
its local and common memories, the number of accesses to the shared memory
and their frequency, and many more parameters. Another example is an
algorithm for synchronous distributed machines. It may be evaluated
by the communication load on each line, the sizes of local memories and the
number of steps.

Composing the simulation of the pulse that involves F&A
instructions out of an execution of the simulation of the reading pulse
gave us a many-parameter correspondence between the two simulations.
The  notion  of  lucid-box  compositions  allows  for  a  closer  adherence  to 
the  existing  procedure  and  enables  us  to  use  information  which  one 
cannot  expect  to  find  in  typical  predeclared  outputs.  Therefore,  we 
claim  that  lucid-box  compositions  are  tailored  for  multi-parameter 
optimization  of  algorithms. 


Appendix II. A Parallel-Design Distributed-Implementation
General-Purpose Computer.

The paper [V1] introduces a scheme of an efficient general-purpose
parallel computer. Its design space (i.e., the model for which
parallel programs are written) is a slightly more permissive model of
computation than the F&A PRAM. However, in order to simplify this
presentation assume that the F&A PRAM is the design space of our
computer. The implementation space is presented as a scheme of an SDC
in which each processor may communicate with at most 4 others. We sketch
below how lucid-box compositions may help in the construction of an
efficient translation of the design space into the implementation
space.

The F&A PRAM employs p processors 1, 2, ..., p and m common memory
addresses 1, 2, ..., m. The SDC employs d ($\le p$) strong processors (called
super-processors); each of them is responsible for simulating the behavior
of about p/d F&A PRAM processors. It also employs 'comparator
processors' that behave similarly to comparator modules of comparison
networks and m 'memory processors', each of which is responsible for
simulating the behavior of one common memory location. We present the
solution only for a pulse of the F&A PRAM of the following form.
Assume that processors $i_1, i_2, \ldots, i_k$ want to execute F&A($A_1, e_1$),
F&A($A_2, e_2$), ..., F&A($A_k, e_k$), where $A_j$ is a common memory address and $e_j$
is an address of the local memory of processor $i_j$, $1 \le j \le k$. All
other processors of the F&A PRAM remain idle during the present pulse.

The translation into the SDC proceeds in cycles. The SDC
consists of a sorting network followed by a merging network as in
Figure 1. At the j-th cycle super-processor i simulates the behavior of
processor $i + (j-1)d$, $1 \le i \le d$, $1 \le j \le \lceil p/d \rceil$. Let us now describe the
first cycle. The F&A instructions of processors $i_j$ such that $1 \le i_j \le d$,
for $1 \le j \le k$, are being simulated. Denote these processors by
$i_1, i_2, \ldots, i_{k_1}$ for some $k_1$, w.l.g.
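
The assignment of F&A PRAM processors to super-processors and cycles stated above is plain arithmetic, which the following sketch makes explicit. The function names `schedule` and `locate` are hypothetical, and processors are numbered from 1 as in the text.

```python
from typing import List, Tuple


def schedule(p: int, d: int) -> List[List[int]]:
    """Entry j-1 lists the PRAM processors simulated in cycle j.

    Within a cycle, the processor simulated by super-processor i comes i-th.
    """
    cycles = -(-p // d)                       # ceil(p / d)
    return [[i + (j - 1) * d
             for i in range(1, d + 1)
             if i + (j - 1) * d <= p]
            for j in range(1, cycles + 1)]


def locate(q: int, d: int) -> Tuple[int, int]:
    """Return (super-processor i, cycle j) with q = i + (j - 1) * d."""
    return (q - 1) % d + 1, (q - 1) // d + 1
```

For example, with p = 7 and d = 3 there are three cycles, and the last one keeps only the first super-processor busy.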

1. By a (any) sorting network, we sort the pairs $(A_j, i_j)$, for
$j = 1, \ldots, k_1$, according to a lexicographic order. Comparator processors
serve as comparator modules. They also transmit the contents of the $e_j$
cells and keep records of their activities at each time unit of the
sorting. The output sorted list contains, in successive locations, F&A
instructions that relate to the same common memory location.

The records which are kept through the sorting algorithm will be
used later in order to send back to the super-processors messages which
correspond to ones that were forwarded through the sorting network.
Here, we already observe some form of lucid-box composition where a
(not fully specified) sorting network is being used for another
procedure. However, the more intriguing use of lucid-box compositions
is being made in the next step.
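
The use of the comparators' records for routing replies back can be sketched with any sorting network. The sketch below uses an odd-even transposition network purely as a stand-in (it is not the network of the paper), and the function names are hypothetical. Each comparator logs whether it exchanged its two inputs; replaying the log in reverse carries a reply from each sorted position back to the input line its request came from.

```python
from typing import List, Sequence, Tuple

Log = List[Tuple[int, bool]]      # (upper line of the comparator, exchanged?)


def sort_with_records(items: Sequence[Tuple[int, int]]) -> Tuple[List[Tuple[int, int]], Log]:
    """Sort (address, processor) pairs lexicographically and log every comparator."""
    data = list(items)
    n = len(data)
    log: Log = []
    for rnd in range(n):                       # n rounds suffice for n lines
        for i in range(rnd % 2, n - 1, 2):     # comparators of this round
            exchanged = data[i] > data[i + 1]
            if exchanged:
                data[i], data[i + 1] = data[i + 1], data[i]
            log.append((i, exchanged))
    return data, log


def route_back(replies: Sequence[str], log: Log) -> List[str]:
    """Send one reply per sorted position back along the recorded path."""
    data = list(replies)
    for i, exchanged in reversed(log):
        if exchanged:
            data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

The routing step never looks at the keys again: it relies only on what the network did, which is information outside the usual input/output specification of sorting.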

2. This output sorted list and the (sorted) list of memory addresses
are merged by a (any) merging network.

The interesting point is that we are not interested in the result of
the merging itself. For each common memory address $A_j$ such that an F&A
instruction which relates to it is being simulated in the first cycle,
let us define the following directed graph. Each line of the merging
network that transmits either the address $A_j$ itself or an F&A
instruction that relates to it corresponds to an edge of this graph.
Each end point of such an edge is a vertex of the graph. The paper
proves the following property of merging networks.

The digraph of $A_j$ contains the following rooted tree as a
subgraph:
(1) The memory processor corresponding to $A_j$ is the root of the tree.
(2) All outputs of the merging network that receive an F&A instruction
which relates to $A_j$ are leaves of this tree.

3. The partial sums needed for the simulation of the F&A instructions
are computed by moving synchronously from the leaves of each tree (that
was described above) to its root and back to the leaves. This
corresponds to a move from right to left in Figure 1 followed by a move
from left to right.

4.  The  partial  sums  are  sent  through  the  merging  and  sorting  networks 
back  to  the  super-processors. 

Each  of  the  following  cycles  is  being  processed  in  the  same  way. 

A  lucid-box  composition  (in  the  wide  sense  of  distributed 
computation)  using  sorting  and  merging  networks  is  exercised  in  the 
design  of  both  the  architecture  of  the  SDC  and  its  operation.  It  is 
interesting  to  note  here  the  importance  of  some  of  the  effects  which 
are  not  included  in  the  input  or  output  specifications  of  the  merging 
and sorting problems. This bears on the wider semantic question of
what the term 'output of a procedure' means.

Let  us  go  back  to  the  translation  procedure.  Note  that  several 
cycles  may  have  F&A  instructions  that  relate  to  the  same  common  memory 
address. Each of the trees that corresponds to a specific common memory
address  in  various  cycles  is  rooted  at  its  memory-processor.  This 
enables  pipelining  of  cycles  with  constant  time  delays  between 
successive  ones.  This  memory-processor  serves  as  the  link  in  the  chain 
between  successive  cycles  that  relate  to  the  common  memory  location. 
The  paper  elaborates  more  on  this  point.  The  following  theorem  can  be 
finally stated about the PDDI General Purpose Computer. Let f(s,m)
(resp. $\ell(s,m)$) be the sum of the numbers of comparator processors
(resp. the lengths of the longest directed paths) in a selected sorting
network of s elements and a selected merging network of lists of length s and m.

Theorem. Given an algorithm with time $O(t/p)$ for all $p \le x$ in an F&A PRAM
with p processors and m common memory locations, where t, x and m are
some numbers, we can simulate it in an SDC with s super-processors, m
memory-processors and f(s,m) comparator-processors in time $O(t/s)$ for
$s \le x/\ell(s,m)$.

Remark. For Batcher's sorting and merging networks we get
$\ell(s,m) = O(\log^2 s + \log m)$ and $f(s,m) = O(s \log^2 s + m \log m)$.

Replacing this sorting network by the one suggested in [AKS]
results in replacing $\log^2 s$ by $\log s$, which is very favorable. However,
large constants will have to be taken into account in this case.

This result compares favorably with other related results. It
employs fewer auxiliary processors than [Ec]; it solves a wider problem
than [GF] and improves on it for the restricted problem that they both
solve.  The  proximity  of  our  machine  to  sorting  and  merging  networks 
may  enable  us  to  take  advantage  of  the  richness  of  accomplishments 
regarding  layouts  and  other  implementation  notions  of  such  networks  in 
various  technologies. 

Appendix  III.   Parametrized  Computing 

The topic of Parametrized Computing was initiated by N. Megiddo in
[Meg1]. We will not elaborate here on Parametrized Computing much
beyond the example given in the introduction. The reader is invited to
embody any of our declarations regarding Parametrized Computing on this
example. An abstract example which is similar to some extent to the
one given below can be found in [Meg2]. Suppose that F(X) is a
monotone function of the real variable X and problem A is to evaluate F
at a given X. Suppose that problem B is to solve the equation F(X) =
0. Let us restrict ourselves to algorithms for problem A that satisfy
the following: throughout their execution the variable X is involved
only in additions, comparisons and multiplications by constants. It is
typical of Parametrized Computing to construct an efficient algorithm
for  problem  B  by  a  lucid-box  composition  that  employs  an  efficient 
algorithm  for  problem  A.  The  specification  of  this  construction  does 
not  typically  need  a  full  specification  of  the  algorithm  for  problem  A. 
The  correspondence  between  the  efficiency  criteria  for  the 
(hypothetical)  algorithm  for  A  and  the  efficiency  criteria  for  the 
algorithm  for  B  is  sometimes  simple.  For  instance,  the  example  which 
is  given  in  the  introduction  applies  the  same  efficiency  criteria  (time 
or  space  complexity)  in  order  to  measure  the  performance  of  both  the 
median  algorithm  and  the  parametrized  median  algorithm.  However, 
[Meg2] proposes a subtler correspondence. In some cases a fast
parallel median algorithm that uses a small number of processors
for A implies a fast sequential algorithm for B. Assume, for instance,
that we had a parallel median algorithm that runs in $T_1(n)$ cycles using
$P_1(n)$ processors. We simulate this (hypothetical) algorithm serially
by two loops:

(a) An external loop is responsible for simulating the cycles in order
(called the 'cycle loop').

(b) An internal loop is responsible for simulating, in a given cycle,
the processors in some order.

Each cycle may contain up to $P_1(n)$ comparisons between input elements
for the median algorithm. (1) In each such case the intersection point
between the corresponding pair of f(X) functions is computed. (2) The
median of these intersection points is computed by the linear-time
sequential algorithm in $O(P_1(n))$ time. (3) F is evaluated (in O(n)
time) at this median intersection point. (4) The interval 'that must
include $\lambda$' is properly chopped. This enables us to (5) evaluate the
results of the comparisons that should be fed into half of the $\le P_1(n)$
comparisons of the present cycle (i.e., the comparisons whose
intersection points relate to this median intersection point as the
chopped portion of the interval) in $O(P_1(n))$ time. We repeat
steps (2) through (4) for the remaining intersection points.
The rest is similar to the use of the sequential median algorithm given
above. The total number of operations required for steps (1), (2), (4)
or (5) is $O(P_1(n) T_1(n))$. Step (3) requires $O(T_1(n) \cdot n \log P_1(n))$. The
parallel sorting algorithm of [AKS] requires n processors and O(log n)
time. If we use it in the obvious way as our parallel median algorithm
we get an $O(n \log^2 n)$ time algorithm for the parametrized median problem.
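
Steps (1) through (4) for the comparisons of a single cycle can be sketched as follows, under assumptions that are not taken from the paper: the functions are lines $aX + b$ with integer coefficients and $a > 0$, F is their lower median, and $\lambda$ denotes the solution of F(X) = 0. The names `make_F`, `intersections` and `chop_interval` are hypothetical, and `statistics.median_low` (which sorts) stands in for the linear-time median algorithm.

```python
from fractions import Fraction
from itertools import combinations
from statistics import median_low
from typing import Callable, List, Sequence, Tuple

Line = Tuple[int, int]            # (a, b) represents f(X) = a*X + b with a > 0


def make_F(lines: Sequence[Line]) -> Callable[[Fraction], Fraction]:
    """F(X) = lower median of the line values at X; increasing because all a > 0."""
    def F(x: Fraction) -> Fraction:
        return median_low([a * x + b for a, b in lines])
    return F


def intersections(lines: Sequence[Line]) -> List[Fraction]:
    """Step (1): the intersection point of every non-parallel pair of lines."""
    return [Fraction(b2 - b1, a1 - a2)
            for (a1, b1), (a2, b2) in combinations(lines, 2) if a1 != a2]


def chop_interval(points: Sequence[Fraction],
                  F: Callable[[Fraction], Fraction],
                  lo: Fraction, hi: Fraction) -> Tuple[Fraction, Fraction, int]:
    """Steps (2)-(4), repeated: shrink (lo, hi) around lambda until no point is inside.

    Returns the final interval and the number of evaluations of F.
    """
    pending = [x for x in points if lo < x < hi]
    evaluations = 0
    while pending:
        x = median_low(pending)               # step (2)
        fx = F(x)                             # step (3)
        evaluations += 1
        if fx == 0:
            return x, x, evaluations          # x is lambda itself
        if fx < 0:                            # step (4): lambda lies to the right
            lo = x
        else:                                 # lambda lies to the left
            hi = x
        pending = [y for y in pending if lo < y < hi]
    return lo, hi, evaluations
```

Once no intersection point lies strictly inside the interval, the outcome of each comparison at $\lambda$ is determined by the side of the interval on which its intersection point falls, which is what step (5) uses. Each evaluation of F discards at least half of the pending points, so a batch of q points needs at most about $\log_2 q + 1$ evaluations.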

We finish this section by mentioning more works that applied the
notions of Parametrized Computing for solving problems taken from
different fields (such as networks, scheduling, location, geometry and
statistics): [CD], [Gu1], [Gu2], [UN], [L], [Meg4], [Meg5], [MT1] and
[MT2]. See [Meg3] for an example of the effectiveness of a repeated
application of lucid-box compositions.

Figure 1. The network of the implementation